# IVRE: Interactive Visual Reasoning under Uncertainty

An environment for evaluating artificial agents' ability to reason under uncertainty. IVRE is an interactive environment featuring rich scenarios centered around **Blicket** detection. Agents in IVRE are placed into environments with various ambiguous action-effect pairs and asked to determine the role of each object. Agents are encouraged to propose effective and efficient experiments that test their hypotheses and, through the resulting observations, gather more information. The game ends when all uncertainties are resolved or the maximum number of trials is used up.
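
The following minimal Python sketch shows one way to read "resolving all uncertainties": as hypothesis elimination over candidate Blicket sets. It assumes a disjunctive activation rule (the machine lights up if at least one Blicket is on it), 9 objects per episode, and 1 to 4 Blickets. The helper names (`all_hypotheses`, `machine_lights`, `consistent`) are hypothetical and do not come from IVRE, and the trials are made up for illustration.

```python
from itertools import combinations


def all_hypotheses(n_objects: int, max_blickets: int) -> list[frozenset[int]]:
    """Enumerate every candidate Blicket set with 1..max_blickets members."""
    return [
        frozenset(combo)
        for k in range(1, max_blickets + 1)
        for combo in combinations(range(n_objects), k)
    ]


def machine_lights(blickets: frozenset[int], placed: frozenset[int]) -> bool:
    """Assumed disjunctive rule: any Blicket on the machine activates it."""
    return bool(blickets & placed)


def consistent(
    hypotheses: list[frozenset[int]], placed: frozenset[int], light: bool
) -> list[frozenset[int]]:
    """Keep only the hypotheses that predict the observed light state."""
    return [h for h in hypotheses if machine_lights(h, placed) == light]


if __name__ == "__main__":
    hypotheses = all_hypotheses(n_objects=9, max_blickets=4)
    print(len(hypotheses))  # 255
    # Made-up trials: (objects placed on the machine, did it light up?)
    trials = [(frozenset({0, 1}), True), (frozenset({1, 2, 3}), False)]
    for placed, light in trials:
        hypotheses = consistent(hypotheses, placed, light)
    print(len(hypotheses))  # 26: object 0 is a Blicket; 1, 2, 3 are not
```

Under this sketch, uncertainty is fully resolved once a single hypothesis remains, and a more efficient experiment is one whose outcome is likely to eliminate more of the remaining candidates.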

#### Benchmark Link
<!-- info: Provide a link to the dataset: -->
<!-- width: half -->
https://sites.google.com/view/ivre/home

#### Benchmark Card Authors
<!-- info: Select **one role per** Data Card Author:

(Usage Note: Select the most appropriate choice to describe the author's role
in creating the Data Card.) -->
<!-- width: half -->
- Manjie Xu (Beijing Institute of Technology)
- Guangyuan Jiang (Peking University)

#### Organizations
<!-- scope: telescope -->
<!-- info: Provide the names of the institution or organization responsible
for publishing the dataset: -->
- School of Computer Science \& Technology, Beijing Institute of Technology
- National Key Laboratory of General Artificial Intelligence, BIGAI
- Institute for AI, Peking University


#### Industry Type
- Academic - Tech

#### Contact
- manjietsu@gmail.com
- jgy@stu.pku.edu.cn


### Dataset Owners
#### Team(s)
<!-- scope: telescope -->
<!-- info: Provide the names of the groups or team(s) that own the dataset: -->
The Cognitive Reasoning (CoRe) Lab, PKU

#### Author(s)
<!-- scope: microscope -->
<!-- info: Provide the details of all authors associated with the dataset:

(Usage Note: Provide the affiliation and year if different from publishing
institutions or multiple affiliations.) -->
- Manjie Xu
- Guangyuan Jiang
- Wei Liang
- Chi Zhang
- Yixin Zhu

#### Funding
<!-- scope: periscope -->
<!-- width: full -->
<!-- info: Provide a short summary of programs or projects that may have funded
the creation, collection, or curation of the dataset.

Use additional notes to capture any other relevant information or
considerations. -->

This work is supported in part by the National Key R\&D Program of China (2022ZD0114900) and the Beijing Nova Program.

## Overview
#### Benchmark Subject(s)
<!-- scope: telescope -->
<!-- info: Select **all applicable** subjects contained in the dataset: -->

- Synthetically generated data

#### Details

IVRE is built on the OpenAI Gym API. Each instance is either a set of synthetic images of 3D geometric objects with rich attributes, rendered with Blender, or a symbolic representation of the Blicket objects. Objects vary in shape (cube, sphere, or cylinder), material (metal or rubber), and color (gray, red, blue, green, brown, cyan, purple, or yellow). We signal activation of the Blicket machine by lighting it up.
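
A short Python sketch of the attribute space described above. It assumes that each object is exactly one combination of the listed shape, material, and color values and that no other attribute varies; the constant names are illustrative and not taken from IVRE's code.

```python
from itertools import product

SHAPES = ("cube", "sphere", "cylinder")
MATERIALS = ("metal", "rubber")
COLORS = ("gray", "red", "blue", "green", "brown", "cyan", "purple", "yellow")

# Each object is one (shape, material, color) combination: 3 * 2 * 8 of them.
OBJECTS = list(product(SHAPES, MATERIALS, COLORS))

if __name__ == "__main__":
    print(len(OBJECTS))  # 48
    print(OBJECTS[0])    # ('cube', 'metal', 'gray')
```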

#### Benchmark Snapshot

Category | Data
--- | ---
Size | Unlimited (synthetically generated)
Number of objects | 48
Number of objects in each episode | 10
Blickets in each episode | 1 to 4
Panels in each episode | 10
Context panels in each episode | 4
Trial panels in each episode | 6
Observation Space (Symbol) | 10 (9 + 1)
Observation Space (Pixel) | 224 × 224 × 3
Action Space | 18 (9 + 9)

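One plausible reading of the "10 (9 + 1)" symbolic observation is nine per-object flags (is the object on the machine?) followed by one flag for the light. The Python sketch below encodes that reading; both the layout and the function name `encode_symbolic_observation` are assumptions for illustration, not IVRE's actual observation format.

```python
N_SLOTS = 9  # assumed: 9 object slots, the "9" in "10 (9 + 1)"


def encode_symbolic_observation(on_machine: set[int], light_on: bool) -> list[int]:
    """Hypothetical 10-dim encoding: 9 slot flags followed by 1 light flag."""
    if any(not 0 <= i < N_SLOTS for i in on_machine):
        raise ValueError("slot index out of range")
    slots = [1 if i in on_machine else 0 for i in range(N_SLOTS)]
    return slots + [int(light_on)]


if __name__ == "__main__":
    print(encode_symbolic_observation({0, 4}, light_on=True))
    # [1, 0, 0, 0, 1, 0, 0, 0, 0, 1]
```
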
#### Additional notes
In each episode, up to 10 panels are available: 4 context panels and 6 trial panels. There are at most $$ \binom{48}{9} \times \sum_{i = 1}^4 \binom{10}{i} \approx 6.5 \times 10^{11} $$ unique episodes with different causal structures (Blicket assignments).
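
The episode count can be checked with Python's `math.comb` (available since Python 3.8). The snippet evaluates the expression term by term exactly as written: $\binom{48}{9}$ for the objects drawn into an episode and $\sum_{i=1}^{4} \binom{10}{i}$ for the Blicket assignments. It does not try to reinterpret those parameters.

```python
from math import comb

# Terms copied from the expression in the text above.
object_sets = comb(48, 9)  # ways to draw the episode's objects
blicket_assignments = sum(comb(10, i) for i in range(1, 5))  # 1..4 Blickets
total = object_sets * blicket_assignments

print(object_sets)          # 1677106640
print(blicket_assignments)  # 385
print(f"{total:.2e}")       # 6.46e+11
```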

### Sensitivity of Data

N/A

#### Security and Privacy Handling

N/A

### Dataset Version and Maintenance
#### Maintenance Status
<!-- scope: telescope -->
<!-- info: Select **one:** -->
**Regularly Updated** - New versions of the dataset have been or will continue to be made available.

#### Version Details
<!-- scope: periscope -->
<!-- info: Provide details about **this** version of the dataset: -->
**Current Version:** 1.0

**Last Updated:** 06/2023

**Release Date:** 06/2023

#### Maintenance Plan
<!-- scope: microscope -->
<!-- info: Summarize the maintenance plan for the dataset:

Use additional notes to capture any other relevant information or
considerations. -->
New versions of the benchmark will be released as new features are added or bugs are fixed.

### Intended Use
#### Dataset Use(s)
<!-- scope: telescope -->
<!-- info: Select **one**: -->
- Safe for research use

#### Suitable Use Case(s)
<!-- scope: periscope -->
<!-- info: Summarize known suitable and intended use cases of this dataset.
Use additional notes to capture any specific patterns that readers should
look out for, or other relevant information or considerations. -->
IVRE is intended as a testbed for evaluating the interactive reasoning under uncertainty of state-of-the-art artificial agents, using either visual input from the rendering engine or the underlying symbolic representation of the environment. IVRE can also be used in human studies to investigate the cognitive mechanisms of human reasoning under uncertainty.

## Access, Retention, & Wipeout
#### Access Type

- External - Open Access

### Use in ML or AI Systems
#### Benchmark Use(s)
<!-- scope: telescope -->
<!-- info: Select **all applicable** -->

- Training
- Testing
- Validation
- Development